Introduction
Azure Databricks has built-in support for data visualizations in notebooks. In this poste, I will guide you how to query and visualize data in a Databricks notebook.
Requirement
You must have a Unity Catalog enable. If you have not yet had, see how to Create Azure Databricks Workspace. A workspace created successfully enable Unity Catalog.
You must have permission to use an existing compute resource or create a new compute resource. If you have not yet had, see Create Sample Azure SQL Server and Database
How-to Guide
Step 1: Create a new notebook
Click on ✙ in the sidebar of your Databricks workspace, then select Notebook. A blank notebook opens in the workspace
Step 2: Query a table
Databricks notebook allow you to develop multiple languages: SQL, Python, Scala or R. Pay attention to connect to your compute resource before starting query
In order to query the samples.nyctaxi.trips table in Unity Catalog using the language of your choice: - Copy and paste the code corresponding to your language in a new empty notebook cell:
SQL
SELECT * FROM samples.nyctaxi.tripsPython
display(spark.read.table("samples.nyctaxi.trips"))Scala
display(spark.read.table("samples.nyctaxi.trips"))R
library(SparkR)
display(sql("SELECT*FROM samples.nyctaxi.trips"))- Press Enter to executive the code then move to the next cell.
Step 3. Data Visualization
The objective is display a bar chart on the average fare amount by trip distance, grouped by the pickup zip code. Thus, the configuration of Visualization should be as following:
In the Visualization Type, select Bar.
For the X column, select fare_amount.
For the Y column, select trip_distance. For the aggregation type, select Average.
For Group by, select pickup_zip.
Click Save
Further posibilities:
After visualizing successfully, you can click on the drop-list of Visualizations to have further options (e.g. Download, Rename, Add to dashboard…)
For a hands-on practice, watch my step-by-step tutorial video demonstrating the above process: